Smart Paper Notes

Privacy-preserving machine learning in data-sharing processes is an ever-critical task that enables collaborative training of Machine Learning (ML) models without the need to share the original data sources. It is especially relevant when an organization must assure that sensitive data remains private throughout the whole ML pipeline, i.e., training and inference phases. This paper presents an innovative framework that uses Representation Learning via autoencoders to generate privacy-preserving embedded data. Thus, organizations can share the data representation to increase machine learning models' performance in scenarios with more than one data source for a shared predictive downstream task.

Translated by Google Translate

In this work, we propose a framework that relies solely on chat-based customer support (CS) interactions to predict the recommendation decision of individual users. For our case study, we analyzed a total of 16.4k users and 48.7k customer support conversations within the financial vertical of a large e-commerce company in Latin America. Our main contribution is to use Natural Language Processing (NLP) to assess and predict recommendation behavior where, in addition to static sentiment analysis, we exploit the predictive power of each user's sentiment dynamics. Our results show that, with interpretable features, it is possible to predict the likelihood that a user will recommend a product or service, based solely on the message-wise sentiment evolution of their CS conversations, in a fully automated way.
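As a rough illustration of what "message-wise sentiment evolution" features could look like, the sketch below summarises one conversation's per-message sentiment scores with a few descriptive statistics, including a least-squares trend. The function name `sentiment_dynamics`, the chosen features, and the assumption that scores lie in [-1, 1] are all hypothetical; the abstract does not specify the authors' actual feature set or sentiment model.

```python
from statistics import mean


def sentiment_dynamics(scores):
    """Summarise per-message sentiment scores of one conversation.

    `scores` is a chronologically ordered list of sentiment values
    (assumed to lie in [-1, 1]; how they are produced is out of scope).
    """
    if not scores:
        raise ValueError("need at least one message score")
    n = len(scores)
    avg = mean(scores)
    if n > 1:
        # Least-squares slope of score against message index (the trend).
        x_mean = (n - 1) / 2
        num = sum((i - x_mean) * (s - avg) for i, s in enumerate(scores))
        den = sum((i - x_mean) ** 2 for i in range(n))
        slope = num / den
    else:
        slope = 0.0  # a single message carries no trend
    return {
        "mean": avg,                          # static, conversation-level sentiment
        "first": scores[0],                   # sentiment at the start
        "last": scores[-1],                   # sentiment at the end
        "slope": slope,                       # direction of the evolution
        "range": max(scores) - min(scores),   # spread across the conversation
    }


# A conversation that starts negative and ends positive.
features = sentiment_dynamics([-0.6, -0.2, 0.1, 0.5])
```

Features of this kind could then be fed to any interpretable classifier; a positive `slope` here would indicate a conversation whose sentiment improved over time.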

A machine learning model to identify corruption in México's public procurement contracts

Andrés Aldana, Andrea Falcón-Cortés, Hernán Larralde

Category: Machine Learning

2022-10-25

The costs and impacts of government corruption range from impairing a country's economic growth to affecting its citizens' well-being and safety. Public contracting between government agencies and private-sector entities, referred to as public procurement, is fertile ground for corrupt practices, generating substantial monetary losses worldwide. Thus, identifying and deterring corrupt activities between the government and the private sector is paramount. However, due to several factors, corruption in public procurement is challenging to identify and track, so corrupt practices often go unnoticed. This paper proposes a machine learning model based on an ensemble of random forest classifiers, which we call hyper-forest, to identify and predict corrupt contracts in México's public procurement data. The method correctly detects most of the corrupt and non-corrupt contracts evaluated in the dataset. Furthermore, we found that the most critical predictors considered in the model are those related to the relationship between buyers and suppliers rather than those related to features of individual contracts. The method proposed here is also general enough to be trained with data from other countries. Overall, our work presents a tool that can help in the decision-making process to identify, predict, and analyze corruption in public procurement contracts.

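This work investigates the use of robust optimal transport (OT) for shape matching. Specifically, we show that recent OT solvers improve both optimization-based and deep learning methods for point cloud registration, boosting accuracy at an affordable computational cost. This manuscript starts with a practical overview of modern OT theory. We then provide solutions to the main difficulties in using this framework for shape matching. Finally, we showcase the performance of transport-enhanced registration models on a wide range of challenging tasks: rigid registration of partial shapes; scene flow estimation on the Kitti dataset; and nonparametric registration of lung vascular trees. Our OT-based methods achieve state-of-the-art results on Kitti and on the challenging lung registration task, in terms of both accuracy and scalability. We also release PVT1010, a new public dataset of 1,010 pairs of lung vascular trees with densely sampled points. This dataset provides a challenging use case for point cloud registration algorithms, with highly complex shapes and deformations. Our work demonstrates that robust OT enables fast pre-alignment and fine-tuning for a wide range of registration models, thereby providing a new key method for the computer vision toolbox. Our code and dataset are available online at: https://github.com/uncbiag/robot.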
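To make the idea of OT-based point matching concrete, here is a minimal sketch of standard entropic (balanced) optimal transport solved with Sinkhorn iterations on two tiny 2D point clouds. It is not the robust (outlier-tolerant) variant the abstract refers to, and it is not taken from the authors' repository; the function names `sinkhorn_plan` and `barycentric_targets`, the uniform point weights, and the values of `epsilon` and `iterations` are assumptions chosen for illustration.

```python
import math


def sinkhorn_plan(source, target, epsilon=0.05, iterations=200):
    """Entropic OT plan between two uniformly weighted point clouds."""
    n, m = len(source), len(target)
    a = [1.0 / n] * n  # uniform mass on source points (assumption)
    b = [1.0 / m] * m  # uniform mass on target points (assumption)
    # Gibbs kernel K_ij = exp(-|x_i - y_j|^2 / epsilon).
    kernel = [[math.exp(-math.dist(x, y) ** 2 / epsilon) for y in target]
              for x in source]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iterations):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def barycentric_targets(plan, target):
    """Send each source point to the plan-weighted average of target points."""
    dim = len(target[0])
    matched = []
    for row in plan:
        mass = sum(row)
        matched.append(tuple(
            sum(row[j] * target[j][d] for j in range(len(target))) / mass
            for d in range(dim)
        ))
    return matched


source = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
# The same triangle shifted by (0.1, 0.1), with its points shuffled.
target = [(0.1, 1.1), (0.1, 0.1), (1.1, 0.1)]

plan = sinkhorn_plan(source, target)
matched = barycentric_targets(plan, target)
```

The resulting `plan` is a soft correspondence matrix, and `matched` gives each source point a target position that a registration model could use for pre-alignment. Very small `epsilon` values can underflow in this plain-kernel form; log-domain implementations avoid that.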
